Variant Discovery ◾ 153
Ready-built human databases are available from the ANNOVAR server, the UCSC Genome Browser
website, or third parties and can be downloaded using the “annotate_variation.pl” program.
For non-human species that have no available annotation database, a database can
be built from the FASTA sequence of the reference genome and the GFF/GTF annotation
file of that species. Both files can be downloaded from databases such as the NCBI
Genome database or the UCSC database.
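The two-file build described above can be sketched as follows. This is a minimal outline under stated assumptions, not a definitive recipe: it assumes the UCSC utility “gtfToGenePred” is installed (it is not part of ANNOVAR), and the prefix, file names, and directory name are hypothetical placeholders. The commands are only printed unless RUN is set to 1.

```shell
# Hypothetical names: replace with your species prefix and downloaded files.
PREFIX="AT"              # database prefix, used like a build name
GTF="AT.gtf"             # gene annotation file (GTF)
FASTA="AT_genome.fa"     # reference genome (FASTA)
DBDIR="atdb"             # directory that will hold the new database

# Step 1: convert the GTF file to genePred format (UCSC gtfToGenePred, assumed installed).
STEP1="gtfToGenePred -genePredExt $GTF $DBDIR/${PREFIX}_refGene.txt"

# Step 2: extract the transcript sequences for those gene models from the genome FASTA.
STEP2="retrieve_seq_from_fasta.pl --format refGene --seqfile $FASTA $DBDIR/${PREFIX}_refGene.txt --out $DBDIR/${PREFIX}_refGeneMrna.fa"

# Print the commands; run them only when RUN=1.
# Unquoted expansion is safe here only because the names contain no spaces.
echo "$STEP1"
echo "$STEP2"
if [ "${RUN:-0}" = "1" ]; then
  mkdir -p "$DBDIR" && $STEP1 && $STEP2
fi
```

If the annotation is in GFF3 rather than GTF, the UCSC “gff3ToGenePred” utility would take the place of the first step.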
As shown in Table 4.4, ANNOVAR consists of six Perl files that can be used as
command-line programs on any computer with Perl installed. The download instructions are
available at “https://annovar.openbioinformatics.org/en/latest/user-guide/download/”.
You may be asked to register with your school email. The download link will be emailed to
you; you can then download the compressed file onto your computer and decompress
it with the “tar xvf” command. If you are using Linux, you can add ANNOVAR to the path by
adding the following line to the end of the “.bashrc” file:
export PATH="YOURPATH/annovar:$PATH"
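A slightly more defensive variant of the same step is sketched below. It prepends the ANNOVAR directory only when it is not already on the path, so sourcing “.bashrc” repeatedly does not duplicate the entry, and it then reports whether the core script can be found. The default location in ANNOVAR_DIR is an assumption; replace it with the directory where you decompressed the archive.

```shell
# Assumption: ANNOVAR was decompressed into $HOME/annovar; adjust as needed.
ANNOVAR_DIR="${ANNOVAR_DIR:-$HOME/annovar}"

# Prepend the directory only when it is not already on PATH.
case ":$PATH:" in
  *":$ANNOVAR_DIR:"*) ;;
  *) export PATH="$ANNOVAR_DIR:$PATH" ;;
esac

# Report whether the core script can now be found (it must be executable).
if command -v annotate_variation.pl >/dev/null 2>&1; then
  echo "annotate_variation.pl found on PATH"
else
  echo "annotate_variation.pl not found; check ANNOVAR_DIR"
fi
```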
4.3.3.1 Annotation Databases
For variant annotation, ANNOVAR requires the annotation databases of the organism to be
downloaded into a local directory. Databases can be downloaded from the UCSC Genome Browser,
the 1000 Genomes Project, the ANNOVAR website, or a third-party URL. You can use
“annotate_variation.pl” to annotate variants, download a database, or list the available
databases for a specific build. The general syntax is as follows:
annotate_variation.pl \
[arguments] \
<query-file|table-name> \
<database-location>
For the complete list of arguments, run:
annotate_variation.pl -h
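As a concrete instance of the general syntax, the sketch below assembles a database download command. The options -buildver, -downdb, and -webfrom are taken from the ANNOVAR documentation; the table name “refGene” and the directory “humandb/” are illustrative choices rather than requirements. Because the download needs network access and an installed ANNOVAR, the command is only printed unless RUN is set to 1.

```shell
BUILD="hg19"        # genome build
TABLE="refGene"     # illustrative table name
DBDIR="humandb/"    # illustrative database directory

# Layout: annotate_variation.pl [arguments] <table-name> <database-location>
set -- annotate_variation.pl -buildver "$BUILD" -downdb -webfrom annovar "$TABLE" "$DBDIR"
DOWNLOAD_CMD="$*"

# Show the assembled command; execute it only when RUN=1.
echo "$DOWNLOAD_CMD"
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```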
To list the available annotation databases for the hg19 build of the human reference genome,
you can run the following command:
TABLE 4.4 ANNOVAR Script Files
ANNOVAR Program               Description
annotate_variation.pl         The core ANNOVAR program for annotation and database download
coding_change.pl              To calculate the mutated sequence and make inference
convert2annovar.pl            To convert genotype-calling file format into ANNOVAR input format
retrieve_seq_from_fasta.pl    To retrieve genomic nucleotide, cDNA sequences, or translated amino acid sequences from FASTA file
table_annovar.pl              To generate a tab-delimited output file with annotation columns
variants_reduction.pl         For prioritizing causal variants